Documentation Index
Fetch the complete documentation index at: https://docs.platform.qubrid.com/llms.txt
Use this file to discover all available pages before exploring further.
DeepSeek · Chat / LLM · 671B Parameters (37B Active) · 128K Context

Tags: Streaming · Reasoning · Chain-of-Thought · Code · JSON Output · Long Context
Overview
DeepSeek R1-0528 is the May 2025 update to the original DeepSeek-R1 — built on the DeepSeek-V3 backbone with 671B total parameters and 37B active per inference pass via Sparse MoE. It achieves performance on par with OpenAI o1, with key improvements including 87.5% on AIME 2025 (up from 70%), reduced hallucinations, enhanced front-end capabilities, and newly added JSON output and function calling support. With chain-of-thought reasoning traces and MIT licensing, R1-0528 is one of the most capable open-source reasoning models available today — served instantly via the Qubrid AI Serverless API.
🧠 671B total / 37B active — frontier reasoning at MoE efficiency.
Deploy via Qubrid AI with no infrastructure required.
Model Specifications
| Field | Details |
|---|---|
| Model ID | deepseek-ai/DeepSeek-R1-0528 |
| Provider | DeepSeek |
| Kind | Chat / LLM |
| Architecture | DeepSeek-V3 backbone — Sparse MoE with 671B total / 37B active, MLA attention, MTP speculative decoding |
| Parameters | 671B total (37B active per inference pass) |
| Context Length | 128,000 Tokens |
| MoE | Yes — Sparse MoE (671B total / 37B active) |
| Release Date | May 2025 |
| License | MIT |
| Training Data | Large-scale diverse dataset; post-trained with RL (GRPO) for enhanced reasoning depth |
| Function Calling | Supported |
| Image Support | N/A |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
| State | 🟢 Ready |
Pricing
💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
|---|---|
| Input Tokens | $0.90 |
| Input Tokens (Cached) | $0.28 |
| Output Tokens | $3.20 |
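Per-request cost is just each token count multiplied by its per-million rate. A minimal sketch using the rates from the table above (the token counts in the example are illustrative, and `estimate_cost` is a hypothetical helper, not part of any SDK):

```python
# Per-1M-token rates from the pricing table above (USD)
RATE_INPUT = 0.90
RATE_INPUT_CACHED = 0.28
RATE_OUTPUT = 3.20

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    `cached_tokens` counts the portion of `input_tokens` served from cache.
    """
    uncached = input_tokens - cached_tokens
    cost = (
        uncached * RATE_INPUT
        + cached_tokens * RATE_INPUT_CACHED
        + output_tokens * RATE_OUTPUT
    ) / 1_000_000
    return round(cost, 6)

# Example: a 10K-token prompt (8K of it cached) producing a 2K-token answer
print(estimate_cost(10_000, 2_000, cached_tokens=8_000))  # → 0.01044
```

Because reasoning traces count toward output tokens, budget `max_tokens` generously and expect output to dominate the bill on hard problems.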
Quickstart
Prerequisites
- Create a free account at platform.qubrid.com
- Generate your API key from the API Keys section
- Replace QUBRID_API_KEY in the code below with your actual key
⚠️ Temperature note: Keep temperature in the 0.5–0.7 range (default 0.6) to prevent repetitive outputs. Values outside this range may degrade reasoning quality.
Python

```python
from openai import OpenAI

# Initialize the OpenAI client with the Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms"
        }
    ],
    max_tokens=16384,
    temperature=0.6,
    top_p=0.95,
    stream=True
)

# Streaming output (remove this loop if stream=False)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print("\n")

# Non-streaming output (use this instead if stream=False)
# print(stream.choices[0].message.content)
```
JavaScript

```javascript
import OpenAI from 'openai';

// Initialize the OpenAI client with the Qubrid base URL
const client = new OpenAI({
  baseURL: 'https://platform.qubrid.com/v1',
  apiKey: 'QUBRID_API_KEY',
});

// Create a streaming chat completion
const stream = await client.chat.completions.create({
  model: 'deepseek-ai/DeepSeek-R1-0528',
  messages: [
    {
      role: 'user',
      content: 'Explain quantum computing in simple terms'
    }
  ],
  max_tokens: 16384,
  temperature: 0.6,
  top_p: 0.95,
  stream: true
});

// Streaming output (remove this loop if stream: false)
for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log('\n');

// Non-streaming output (use this instead if stream: false)
// console.log(stream.choices[0].message.content);
```
Go

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	url := "https://platform.qubrid.com/v1/chat/completions"
	data := map[string]interface{}{
		"model": "deepseek-ai/DeepSeek-R1-0528",
		"messages": []map[string]interface{}{
			{
				"role":    "user",
				"content": "Explain quantum computing in simple terms",
			},
		},
		"temperature": 0.6,
		"max_tokens":  16384,
		"stream":      true,
		"top_p":       0.95,
	}

	jsonData, err := json.Marshal(data)
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer QUBRID_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	res, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	// Print each server-sent-event line as it arrives
	scanner := bufio.NewScanner(res.Body)
	for scanner.Scan() {
		if line := scanner.Text(); line != "" {
			fmt.Println(line)
		}
	}
}
```
cURL

```bash
curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-0528",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "temperature": 0.6,
    "max_tokens": 16384,
    "stream": true,
    "top_p": 0.95
  }'
```
Live Example
Prompt: Explain quantum computing in simple terms
Response:
<think>
The user wants a simple explanation of quantum computing.
I should cover qubits, superposition, entanglement, and practical
applications without overwhelming technical detail.
</think>
Quantum computing — simplified:
Classical computers store information as bits: 0 or 1.
Quantum computers use qubits, which can be 0, 1, or both simultaneously
(superposition) — like a coin spinning before it lands.
This lets quantum computers evaluate many possible solutions at once,
rather than sequentially — making them exponentially faster for
specific problem types.
Two key quantum properties:
1. Superposition — a qubit exists in multiple states at once until measured
2. Entanglement — two qubits can be linked so the state of one
instantly determines the state of the other,
enabling coordinated computation at massive scale
Where quantum computing matters most:
- Cryptography (breaking and building encryption systems)
- Drug discovery (simulating molecular structures)
- Optimization (logistics, financial portfolio modeling)
- Machine learning acceleration
Important caveat: quantum computers aren't replacing laptops.
They're precision instruments for problems that would take classical
supercomputers millions of years to solve.
Try it yourself in the Qubrid AI Playground →
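As the example shows, R1-0528 emits its reasoning inside `<think>…</think>` tags, so in practice you often want to separate the trace from the final answer before displaying or storing it. A minimal sketch using only the standard library (`split_reasoning` is a hypothetical helper, not part of any SDK):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    The reasoning is the concatenated content of <think>...</think>
    blocks; the answer is everything outside those blocks.
    """
    thoughts = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return "\n".join(t.strip() for t in thoughts), answer

reasoning, answer = split_reasoning(
    "<think>The user wants a simple explanation.</think>"
    "Quantum computers use qubits, which can be 0, 1, or both."
)
print(answer)  # the final answer, with the reasoning trace removed
```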
Playground Features
The Qubrid AI Playground lets you interact with DeepSeek R1-0528 directly in your browser — no setup, no code, no cost to explore.
🧠 System Prompt
Define the model’s role, output format, and reasoning constraints before the conversation begins. Particularly powerful for structured reasoning tasks and JSON output workflows.
Example: "You are an expert mathematician. Work through every problem
step by step, show all intermediate reasoning, and always verify
your answer before presenting it."
Set your system prompt once in the Qubrid Playground and it applies across every turn of the conversation.
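Over the API, the same effect comes from placing a `system` message at the head of the `messages` array. A minimal sketch, reusing the example prompt above (`build_messages` is a hypothetical helper, not part of any SDK):

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    """Prepend a system message so it governs every turn of the conversation."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "You are an expert mathematician. Work through every problem step by step, "
    "show all intermediate reasoning, and always verify your answer before presenting it.",
    "What is the derivative of x^3 + 2x^2 - 5x + 1?",
)
# Pass `messages` to client.chat.completions.create(...) exactly as in the Quickstart.
print(messages[0]["role"])  # → system
```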
🎯 Few-Shot Examples
Guide the model’s reasoning style and output format with concrete examples — especially effective for complex structured tasks.
| User Input | Assistant Response |
|---|---|
| What is the derivative of x³ + 2x² - 5x + 1? | Step 1: Apply power rule to each term. d/dx(x³) = 3x², d/dx(2x²) = 4x, d/dx(-5x) = -5, d/dx(1) = 0. Result: 3x² + 4x - 5 |
| Debug this Python function: def add(a, b): return a - b | Bug found: The operator is subtraction (-) but the function name implies addition. Fix: return a + b |
💡 Stack multiple few-shot examples in the Qubrid Playground to guide reasoning depth, chain-of-thought format, and output structure — no fine-tuning required.
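Outside the Playground, few-shot examples become alternating `user`/`assistant` message pairs placed ahead of the real query. A minimal sketch (`with_few_shot` is a hypothetical helper, not part of any SDK):

```python
def with_few_shot(examples: list[tuple[str, str]], query: str) -> list[dict]:
    """Turn (input, output) example pairs into user/assistant turns,
    then append the real query as the final user message."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

messages = with_few_shot(
    [("Debug this Python function: def add(a, b): return a - b",
      "Bug found: the operator is subtraction (-) but the name implies addition. "
      "Fix: return a + b")],
    "Debug this Python function: def square(x): return x + x",
)
print(len(messages))  # → 3
```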
Inference Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 0.6 | Recommended range 0.5–0.7 to prevent endless repetitions |
| Max Tokens | number | 16384 | Maximum number of tokens to generate |
| Top P | number | 0.95 | Nucleus sampling: considers tokens with top_p probability mass |
Use Cases
- Advanced mathematical reasoning
- Code generation and debugging
- Complex multi-step problem solving
- Research and analysis
- JSON-structured output generation
- Function calling and tool use
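For the JSON-structured output use case, OpenAI-compatible endpoints typically accept a `response_format` of `{"type": "json_object"}`; whether Qubrid forwards this parameter for R1-0528 is an assumption here, so verify against the API reference before relying on it. A minimal sketch of the request kwargs plus a validation step (no network call; `parse_json_reply` is a hypothetical helper):

```python
import json

# Request kwargs for a JSON-output completion (ASSUMES the endpoint
# honors the OpenAI-style `response_format` parameter)
request_kwargs = {
    "model": "deepseek-ai/DeepSeek-R1-0528",
    "messages": [
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user",
         "content": 'List three qubit properties as {"properties": [...]}'},
    ],
    "response_format": {"type": "json_object"},
    "temperature": 0.6,
}

def parse_json_reply(content: str) -> dict:
    """Validate and parse the model's reply; raises ValueError on malformed JSON."""
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc

# Example with a canned reply, standing in for the API response
print(parse_json_reply('{"properties": ["superposition", "entanglement", "decoherence"]}'))
```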
Strengths & Limitations
| Strengths | Limitations |
|---|---|
| 671B total / 37B active MoE — frontier reasoning at high efficiency | 128K max context (shorter than some competitors) |
| 87.5% on AIME 2025, up 17.5 percentage points from the prior release | Requires very large infrastructure for self-hosting |
| Supports JSON output and function calling | Temperature must stay in 0.5–0.7 range |
| Reduced hallucinations vs prior R1 | Reasoning traces increase total output length |
| Fully open-source with MIT license | |
| Chain-of-thought reasoning with visible traces | |
Why Qubrid AI?
- 🚀 No infrastructure setup — 671B MoE served serverlessly, pay only for what you use
- 🔁 OpenAI-compatible — drop-in replacement using the same SDK, just swap the base URL
- 💰 Cached input pricing — $0.28/1M for cached tokens, dramatically reducing costs on repeated context
- 🧪 Built-in Playground — prototype with system prompts and few-shot examples instantly at platform.qubrid.com
- 📊 Full observability — API logs and usage tracking built into the Qubrid dashboard
- 🌐 Multi-language support — Python, JavaScript, Go, cURL out of the box
Resources
Built with ❤️ by Qubrid AI
Frontier models. Serverless infrastructure. Zero friction.